Zero-shot classification


A Experiment on zero-shot classification

Neural Information Processing Systems

The top two rows show easy cases, while the bottom three rows present hard cases, including crowdedness, complex backgrounds, and tiny objects.




Topological Alignment of Shared Vision-Language Embedding Space

You, Junwon, Kang, Dasol, Jung, Jae-Hun

arXiv.org Artificial Intelligence

Contrastive Vision-Language Models (VLMs) have demonstrated strong zero-shot capabilities. However, their cross-modal alignment remains biased toward English due to limited multilingual multimodal data. Recent multilingual extensions have alleviated this gap but enforce instance-level alignment while neglecting the global geometry of the shared embedding space. We address this problem by introducing ToMCLIP (Topological Alignment for Multilingual CLIP), a topology-aware framework aligning embedding spaces with topology-preserving constraints. The proposed method applies persistent homology to define a topological alignment loss and approximates the persistence diagram with theoretical error bounds using a graph sparsification strategy. This work validates the proposed approach, showing enhanced structural coherence of multilingual representations, higher zero-shot accuracy on CIFAR-100, and stronger multilingual retrieval performance on xFlickr&CO. Beyond VLMs, the proposed approach provides a general method for incorporating topological alignment into representation learning.
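As a concrete illustration of the topology-preserving idea, the sketch below computes persistent-homology diagrams for two embedding clouds and scores their discrepancy with the bottleneck distance, using the `gudhi` library. This is a minimal reading of the abstract, not ToMCLIP's actual loss: the embeddings are simulated with random arrays, the Rips-filtration parameters are arbitrary, and the paper's graph-sparsification approximation is not reproduced.

```python
# Minimal sketch of a topological alignment score between two embedding
# clouds, in the spirit of the abstract above. Assumptions: gudhi is
# installed; the data, Rips parameters, and use of the bottleneck
# distance are illustrative choices, not the paper's exact method.
import numpy as np
import gudhi

def persistence_diagram(points, max_edge=10.0, dim=0):
    """0-dimensional persistence intervals of a Vietoris-Rips filtration."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=dim + 1)
    st.compute_persistence()
    return st.persistence_intervals_in_dimension(dim)

def topological_alignment_score(emb_a, emb_b):
    """Bottleneck distance between the diagrams of two embedding clouds."""
    da = persistence_diagram(emb_a)
    db = persistence_diagram(emb_b)
    # Drop infinite bars so the bottleneck distance is finite.
    da = da[np.isfinite(da[:, 1])]
    db = db[np.isfinite(db[:, 1])]
    return gudhi.bottleneck_distance(da, db)

rng = np.random.default_rng(0)
english = rng.normal(size=(64, 8))                   # stand-in English embeddings
other = english + 0.05 * rng.normal(size=(64, 8))    # perturbed "multilingual" cloud
print(topological_alignment_score(english, other))
```

Note that the bottleneck distance as computed here is not differentiable; a trainable loss like the one the abstract describes would need a differentiable surrogate over the diagrams.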


CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals

Cui, Wenhui, Sandino, Christopher, Pouransari, Hadi, Liu, Ran, Minxha, Juri, Zippi, Ellen, Verma, Aman, Sedlackova, Anna, Azemi, Erdrin, Mahasseni, Behrooz

arXiv.org Artificial Intelligence

Hand gesture classification using high-quality structured data such as videos, images, and hand skeletons is a well-explored problem in computer vision. Leveraging low-power, cost-effective biosignals, e.g., surface electromyography (sEMG), allows for continuous gesture prediction on wearables. In this paper, we demonstrate that learning representations from weak-modality data that are aligned with those from structured, high-quality data can improve representation quality and enable zero-shot classification. Specifically, we propose a Contrastive Pose-EMG Pre-training (CPEP) framework to align EMG and pose representations, where we learn an EMG encoder that produces high-quality and pose-informative representations. We assess the gesture classification performance of our model through linear probing and zero-shot setups. Our model outperforms emg2pose benchmark models by up to 21% on in-distribution gesture classification and 72% on unseen (out-of-distribution) gesture classification.
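A minimal sketch of the cross-modal contrastive setup the abstract describes, assuming a CLIP-style symmetric InfoNCE objective between EMG and pose embeddings. The encoder architecture, dimensions, and temperature here are illustrative placeholders, not the authors' CPEP implementation.

```python
# Sketch: align an EMG encoder's outputs with (frozen) pose embeddings
# via a symmetric InfoNCE loss. All shapes and modules are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMGEncoder(nn.Module):
    def __init__(self, in_channels=16, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
            nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):              # x: (batch, channels, time)
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(emg_z, pose_z, temperature=0.07):
    """Matched EMG/pose pairs are positives; all others are negatives."""
    logits = emg_z @ pose_z.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

emg = torch.randn(32, 16, 200)                        # dummy sEMG windows
pose_z = F.normalize(torch.randn(32, 128), dim=-1)    # stand-in pose embeddings
print(contrastive_loss(EMGEncoder()(emg), pose_z).item())
```

After training, zero-shot classification would compare an EMG embedding against pose (or pose-derived prototype) embeddings of candidate gestures and pick the nearest.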


Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models

Marincione, Davide, Crisostomi, Donato, Dessi, Roberto, Rodolà, Emanuele, Rossi, Emanuele

arXiv.org Artificial Intelligence

Foundation models capable of generalizing across species and tasks represent a promising new frontier in bioacoustics, with NatureLM being one of the most prominent examples. While its domain-specific fine-tuning yields strong performance on bioacoustic benchmarks, we observe that it also introduces trade-offs in instruction-following flexibility. For instance, NatureLM achieves high accuracy when prompted for either the common or scientific name individually, but its accuracy drops significantly when both are requested in a single prompt. We address this by applying a simple model merging strategy that interpolates NatureLM with its base language model, recovering instruction-following capabilities with minimal loss of domain expertise. Finally, we show that the merged model exhibits markedly stronger zero-shot generalization, achieving over a 200% relative improvement and setting a new state-of-the-art in closed-set zero-shot classification of unseen species.
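The merging strategy described above amounts to per-parameter linear interpolation between the fine-tuned and base checkpoints. A minimal sketch, assuming identically shaped state dicts and a hypothetical mixing weight alpha; NatureLM's actual checkpoints and architecture are not assumed here.

```python
# Sketch: merge a fine-tuned model with its base model by linear
# interpolation of weights. Toy Linear modules stand in for the LLMs.
import torch

def interpolate_state_dicts(base_sd, finetuned_sd, alpha=0.5):
    """Return alpha * finetuned + (1 - alpha) * base, per tensor."""
    return {name: (1 - alpha) * p + alpha * finetuned_sd[name]
            for name, p in base_sd.items()}

base = torch.nn.Linear(4, 4)     # stand-in base model
tuned = torch.nn.Linear(4, 4)    # stand-in fine-tuned model
merged = torch.nn.Linear(4, 4)
merged.load_state_dict(
    interpolate_state_dicts(base.state_dict(), tuned.state_dict(), alpha=0.7))
```

Sweeping alpha trades domain expertise against instruction-following; the abstract's result suggests an intermediate value recovers the base model's flexibility at little cost.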


Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity

Tang, MingZe, Jacob, Jubal Chandy

arXiv.org Artificial Intelligence

Recent Vision-Language Models (VLMs) enable zero-shot classification by aligning images and text in a shared space, a promising approach for data-scarce conditions. However, the influence of prompt design on recognizing visually similar categories, such as human postures, is not well understood. This study investigates how prompt specificity affects the zero-shot classification of sitting, standing, and walking/running on a small, 285-image COCO-derived dataset. A suite of modern VLMs, including OpenCLIP, MetaCLIP 2, and SigLip, were evaluated using a three-tiered prompt design that systematically increases linguistic detail. Our findings reveal a compelling, counter-intuitive trend: for the highest-performing models (MetaCLIP 2 and OpenCLIP), the simplest, most basic prompts consistently achieve the best results. Adding descriptive detail significantly degrades performance: for instance, MetaCLIP 2's multi-class accuracy drops from 68.8% to 55.1%, a phenomenon we term "prompt overfitting". Conversely, the lower-performing SigLip model shows improved classification on ambiguous classes when given more descriptive, body-cue-based prompts.
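The setup is easy to reproduce in outline with OpenCLIP's public API. The sketch below scores one image against three prompt tiers of increasing linguistic detail; the prompt wordings, model tag, and image path are illustrative assumptions, not the paper's exact materials.

```python
# Sketch: zero-shot posture classification under three prompt tiers,
# using open_clip. "example.jpg" is a placeholder image path.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

prompt_tiers = {
    "basic":    ["a person sitting", "a person standing", "a person walking"],
    "context":  ["a photo of a person sitting on a chair",
                 "a photo of a person standing upright",
                 "a photo of a person walking outdoors"],
    "body_cue": ["a person with bent knees resting on a seat",
                 "a person with straight legs bearing weight on both feet",
                 "a person mid-stride with one foot off the ground"],
}

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
with torch.no_grad():
    img_z = model.encode_image(image)
    img_z /= img_z.norm(dim=-1, keepdim=True)
    for tier, prompts in prompt_tiers.items():
        txt_z = model.encode_text(tokenizer(prompts))
        txt_z /= txt_z.norm(dim=-1, keepdim=True)
        probs = (100 * img_z @ txt_z.t()).softmax(dim=-1)
        print(tier, probs.squeeze().tolist())
```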



